Abstract

We consider the task of compression of information when the source of the information and
the destination do not agree on the prior, i.e., the distribution from which the information
is being generated. This setting was considered previously by Kalai et al. (ICS 2011) who 
suggested that this was a natural model for human communication, and efficient schemes for 
compression here could give insights into the behavior of natural languages. Kalai et al. gave 
a compression scheme with nearly optimal performance, assuming the source and destination 
share some uniform randomness. In this work we explore the need for this randomness, and 
give some non-trivial upper bounds on the deterministic communication complexity for this 
problem. In the process we introduce a new family of structured graphs of constant fractional 
chromatic number whose (integral) chromatic number turns out to be a key component in the 
analysis of the communication complexity. We provide some non-trivial upper bounds on the 
chromatic number of these graphs to get our upper bound, while using lower bounds on variants 
of these graphs to prove lower bounds for some natural approaches to solve the communication 
complexity question. Tight analysis of the communication complexity of our problems and of the
chromatic number of the underlying graphs remains open.

Keywords: Source coding, communication complexity, graph coloring

1 Introduction

The following example illustrates the questions studied in this paper: Suppose Alice and Bob have
a ranking of a set $U$ of $N$ elements, say, movies. Specifically Alice's rank function is $A : [N] \to U$
and Bob's rank function is $B : [N] \to U$, where $[N] = \{1, \ldots, N\}$ and $A$ and $B$ are bijections
with $A(i)$ naming the $i$th ranked movie in Alice's ranking. Suppose further that Alice and Bob
know that their rankings are "close", specifically for every $x \in U$, $|A^{-1}(x) - B^{-1}(x)| \le 2$. How
many bits does Alice have to send to Bob so that Bob knows her top-ranked movie, i.e., $A(1)$?
On the one hand Bob knows $A(1)$ is one of the three-element set $S_1 = \{B(1), B(2), B(3)\}$ and so
the information content from his point of view is bounded by $\log_2 3$ bits. Indeed this leads to a
randomized communication scheme, with Alice and Bob sharing common randomness, with $O(1)$
bits of communication. However the deterministic communication complexity of the question is
not as easily settled. Part of the reason is that Alice doesn't know $S_1$ and so has to "guess" it to
communicate $A(1)$. Still she is not clueless: She knows it is contained in $T_1 = \{A(1), \ldots, A(5)\}$
and perhaps this can help her communicate $A(1)$ efficiently to Bob. The question of interest to us
in this work is: Can Alice communicate $A(1)$ to Bob with a number of bits that is independent of
$N$? (Unfortunately, we do not answer this question, though we do give a non-trivial upper bound.
We will elaborate on this later.)
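The containments used in this example can be checked numerically. The following is a minimal Python sketch, not part of the paper: the helper names `perturb` and `max_rank_gap` and the choice of 20 movies are illustrative assumptions. It builds a pair of rankings in which every movie moves by at most 2 positions and confirms that $A(1) \in S_1$ and $S_1 \subseteq T_1$.

```python
import random

def perturb(ranking, rng):
    """Return a ranking in which every item moves by at most 2 positions.

    Two rounds of disjoint adjacent swaps are applied; each round moves
    an item by at most one position. (Illustrative helper, not from the paper.)
    """
    b = list(ranking)
    for offset in (0, 1):
        for i in range(offset, len(b) - 1, 2):
            if rng.random() < 0.5:
                b[i], b[i + 1] = b[i + 1], b[i]
    return b

def max_rank_gap(a, b):
    """max over items x of |rank of x in a - rank of x in b|."""
    pos_b = {x: i for i, x in enumerate(b)}
    return max(abs(i - pos_b[x]) for i, x in enumerate(a))

rng = random.Random(0)
A = list(range(20))          # A[i] is Alice's (i+1)-th ranked movie
rng.shuffle(A)
B = perturb(A, rng)          # Bob's ranking, close to Alice's

S1 = set(B[:3])              # Bob's candidates for Alice's top movie
T1 = set(A[:5])              # Alice's superset of Bob's candidate set
print(max_rank_gap(A, B), A[0] in S1, S1 <= T1)
```

The sketch only illustrates why Bob has three candidates and why Alice can confine them to her top five; it says nothing about how few bits suffice deterministically.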

The question above is a prototypical example of "communication amid uncertainty", where the
communicating players have fairly good information about each other (in the example above Alice
and Bob know each other's ranking of each movie to within ±2), but are not sure of each other's
information and do not have a common-ground to base communication on. One way to proceed 
in such settings is for the players to communicate enough information to agree on a common prior 
and then to use classical compression; but this would be excessively wasteful for, say, a one-time 
communication. One could hope for a direct solution which aims to establish communication 
without requiring agreement on the prior, and indeed this was the question studied by Kalai et 
al. [5]. Kalai et al. argue that this models many natural forms of communication among humans 
where humans are uncertain about each other's contexts, but try to communicate efficiently despite 
the lack of a perfect common basis, or without trying to first agree on the prior. They argue 
in particular that this leads to certain phenomena in natural communication systems (natural 
language) that are not seen in carefully designed communication systems (where perfect agreement 
on the prior can be assumed). 

The specific problem they consider is the following. Suppose Alice wishes to communicate a
message $m \in U$ to Bob, where Alice is operating under the belief that the message is chosen
according to the probability distribution $P$ on $U$. Bob on the other hand operates under the
belief that the messages are chosen according to a distribution $Q$ on $U$. Both players are aware
that their distributions may not be identical but operate under the "knowledge" that their
distributions are close. Specifically, we say that $P$ and $Q$ are $\Delta$-close if for all $m \in U$, we have
$\log_2 (P(m)/Q(m)),\ \log_2 (Q(m)/P(m)) \le \Delta$. (We use this to also define our distance between
distributions: The distance between $P$ and $Q$, denoted $\delta(P, Q)$, is defined to be the minimum $\Delta$ such that
$P$ and $Q$ are $\Delta$-close.) The question Kalai et al. investigate is: What is the expected number of
bits, under distribution $P$, that Alice has to send to Bob so that Bob can recover the message? (We
note that similar questions, in the interactive setting, were also studied in the works of Harsha et
al. [4] and of Braverman and Rao [1], though their motivations were quite different. Both works
focus on the setting when sender and receiver have different priors and are trying to generate a
random variable that is maximally correlated under their priors. In our case the sender gets a
concrete message from its prior and wishes to communicate it. The focus in both works is on
randomized solutions that get the communication complexity down to the minimum possible amount,
whereas our thrust is to use less (or no) randomness at the expense of slightly larger communication
complexity.)

Without any knowledge of $P$ and $Q$, it is still trivial for Alice to communicate $m$ with $\log N$ bits.
On the other hand, if $\Delta = 0$ (and so Alice and Bob have $P = Q$), then standard compression can
communicate this information with $H(P) + O(1)$ bits (where $H(P) = \sum_{m \in U} P(m) \log_2 (1/P(m))$
denotes the binary entropy of $P$) which may be much smaller than $\log N$. Kalai et al. show that
if Alice and Bob share some common random bits, then they can communicate with each other
with $H(P) + 2\Delta + O(1)$ bits. This gives a graceful degradation of performance when $\Delta > 0$, and
indeed in many natural instances of communication where $\Delta$ may be large (say 50), this gives a
very efficient communication mechanism amid large amounts of uncertainty.
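The two quantities that govern these bounds, the distance $\delta(P, Q)$ and the entropy $H(P)$, are straightforward to compute for explicit distributions. The following Python sketch is an illustration only: it assumes distributions are given as dictionaries with strictly positive probabilities, and the function names are hypothetical.

```python
import math

def distance(P, Q):
    """delta(P, Q): the least Delta such that P and Q are Delta-close.

    Assumes P and Q share the same support and have no zero entries.
    """
    return max(abs(math.log2(P[m] / Q[m])) for m in P)

def entropy(P):
    """Binary entropy H(P) = sum_m P(m) * log2(1 / P(m))."""
    return sum(p * math.log2(1 / p) for p in P.values() if p > 0)

P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.25, "b": 0.75}
# Leading terms of the shared-randomness bound H(P) + 2*Delta + O(1) quoted above.
print(distance(P, Q), entropy(P), entropy(P) + 2 * distance(P, Q))
```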

The assumption that Alice and Bob share a common random string is however a major one, and
it is unclear how to achieve it in "nature". This assumption affects the solution both technically and
conceptually. We discuss the technical implication first. Technically, this assumption is not made 
to alleviate computational complexity considerations, but is rather to overcome a fundamental 
challenge. The randomness is independent of P and Q and so effectively manages to convert a 
solution that works for most pairs of Alice and Bob (or rather their beliefs P and Q) to one that 
works for every pair P and Q, with high probability over the randomness. Unfortunately, any 
attempt to fix the random string leads back to a solution that only works for most pairs of beliefs 
(P,Q) (over any distribution over the beliefs), but not one that works for every pair. Thus the 
technical question that remains open is: "Is there a single solution that will work for every choice 
of P and Q with performance roughly that of Kalai et al.?" 

We now return to the conceptual implications of the assumption of shared perfect randomness. 
In terms of the motivating phenomenon of "natural communication among humans", Kalai et al. 
suggest the presence of a common dictionary (of say English) as presenting such shared randomness. 
They do point out, however, that the assumption that such a dictionary is a random string is mainly 
a convenient technical assumption, rather than an empirically justifiable one. In particular, the 
assumption that our beliefs are independent of the dictionary is not easy to justify. Indeed the 
contrary may well be true: Our dictionary may well be strongly influenced by our beliefs. Thus one 
could ask - can one weaken the assumption on the shared randomness to some much weaker notion 
of shared context? Our work explores this question and gives some partial answers, while also 
highlighting some intriguing communication complexity/graph-theoretic questions that are raised 
by this line of work. 

1.1 Formal definitions and main results 

We start by defining the notion of an "uncertain compression scheme".

We let $\{0,1\}^*$ denote the set of all finite length binary strings. For $x \in \{0,1\}^*$, let $|x|$ denote
its length. Throughout, $U$, the set of all messages, will be a finite set of size $N$. Let $\mathcal{P}(U)$ denote
the space of all probability distributions over $U$.

Definition 1.1 ((Basic) Uncertain Compression Scheme). For positive real $\Delta$, an Uncertain
Compression Scheme (UCS) for distance $\Delta$ over the universe $U$ is given by a pair of functions
$E : \mathcal{P}(U) \times U \to \{0,1\}^*$ and $D : \mathcal{P}(U) \times \{0,1\}^* \to U$ that satisfy the following correctness
condition: For every pair of distributions $P, Q \in \mathcal{P}(U)$ that are $\Delta$-close and for every $m \in U$, we have
$D(Q, E(P,m)) = m$. The performance of a UCS $(E,D)$ is given by the function $L : \mathcal{P}(U) \to \mathbb{R}^+$,
where $L(P) = \mathbb{E}_{m \leftarrow_P U}[|E(P,m)|]$, i.e., the expected length of the encoding under the distribution
$P$. We refer to such a scheme as a $(\Delta, L)$-UCS.

In English, the definition above explicitly provides the distribution as input to the encoding and
decoding schemes, and expects the schemes to work correctly even if the distributions used by the
encoder and decoder are not the same, as long as they are $\Delta$-close to each other. While in general
we would like compression schemes which work for all possible distributions $P, Q$ that are within
$\Delta$ of each other, and with no error (as expected in the definition above), some of our schemes are
weaker and work with some error, or only for some class of distributions. We define such general
UCSs below.

Definition 1.2 ((General) Uncertain Compression Scheme). For positive real $\Delta$ (for distance),
$\epsilon \in [0, 1]$ (for error), a class of distributions $\mathcal{F} \subseteq \mathcal{P}(U)$, and performance function $L : \mathcal{F} \to \mathbb{R}^+$,
a $(\Delta, \epsilon, \mathcal{F}, L)$-Uncertain Compression Scheme (UCS) over the universe $U$ is given by a pair of functions
$E : \mathcal{F} \times U \to \{0,1\}^* \cup \{\perp\}$ and $D : \mathcal{F} \times (\{0,1\}^* \cup \{\perp\}) \to U \cup \{\perp\}$ that satisfy the following
conditions:

1. For every pair of distributions $P, Q \in \mathcal{F}$ that are $\Delta$-close and for every $m \in U$, it is the case
that if $E(P,m) \ne \perp$ then $D(Q, E(P,m)) = m$. Furthermore $D(\perp) = \perp$.

2. $\Pr_{m \leftarrow_P U}[E(P,m) = \perp] \le \epsilon$.

3. For every $P \in \mathcal{F}$, we have $\mathbb{E}_{m \leftarrow_P U}[|E(P,m)|] \le L(P)$.

Note that we do not distinguish the two definitions above by name, but rather just by the 
number of parameters. So if the number of parameters is just two, then it is assumed that there is 
no error, and the performance holds for all distributions. 

We note that the definitions above only cover deterministic compression schemes. A compression
scheme with shared randomness can be defined analogously, but we don't do so here. We
also stress that the choice of $P$ and $Q$ is "worst-case" within the family $\mathcal{F}$ (as formalized by the
universal quantifier in the correctness condition). There are no assumptions that $\mathcal{F}$ is small (has
only finitely many elements), which tends to be the setting for universal compression. Similarly,
we do not consider a sequence of messages that need to be transmitted: Rather, we are considering
one-shot communication with no assumptions on the distributions $P$ and $Q$, other than that they
are from $\mathcal{F}$ and $\Delta$-close.

We recall that Kalai et al. present a $(\Delta, H(P) + 2\Delta + c)$-UCS (with shared randomness) for
some constant $c < 3$. We give two deterministic schemes in this paper, both having complexity
depending on $N$, but both using substantially less than $\log N$ bits.

Theorem 1.3. For every $\Delta > 0$ there exists a $(\Delta, O(H(P) + \Delta + \log \log N))$-UCS, i.e., a
deterministic universal compression scheme that works for all pairs $P, Q$ that are within distance $\Delta$ of
each other, and where the expected length of encoding is at most $O(H(P) + \Delta + \log \log N)$.

The dependence on $N$ of this scheme is non-trivial and thus may even be reasonable in "natural
circumstances". However it is not clear if such a dependency on $N$ is necessary. Motivated by the
quest to understand the dependence on $N$ more closely, we explore schemes whose performance
is not necessarily linear in $H(P)$. Simultaneously we relax our schemes to allow them to "drop"
messages with probability $\epsilon$. We note that if we don't do the latter, then the former is not really a
relaxation: Any error-free scheme with superlinear dependence on $H(P)$ can be converted to one
with linear dependence on $H(P)$ by a simple reduction (see Lemma 3.14).

Our next theorem gives a scheme that is weaker than the one from Theorem 1.3 in its dependence
on the entropy $H(P)$ and in that it errs with non-zero probability. But it does achieve significantly
better dependence on $N$.

Theorem 1.4. For every $\epsilon > 0$ and $\Delta > 0$ there exists a $(\Delta, \epsilon, \mathcal{P}(U), \exp(H(P)/\epsilon + \Delta \log^* N))$-UCS,
i.e., the scheme has error probability at most $\epsilon$, it works for all pairs of distributions $P, Q$
within distance $\Delta$ and the expected length of the encoding is at most $\exp(H(P)/\epsilon + \Delta \log^* N)$.

In the above the notation $\exp(x)$ denotes a function of the form $c^x$ for some universal constant
$c$, and $\log^* N$ denotes the minimum integer $i$ such that $\log^{(i)} N \le 1$, where $\log^{(i)}$ is the logarithm
function iterated $i$ times.
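To get a feel for how slowly $\log^* N$ grows, the definition can be transcribed directly. The sketch below is illustrative and assumes base-2 logarithms (the text leaves the base to the universal constant); the function names are hypothetical.

```python
import math

def iterated_log(n, i):
    """log^(i) n: the base-2 logarithm applied i times (assumes it stays defined)."""
    x = float(n)
    for _ in range(i):
        x = math.log2(x)
    return x

def log_star(n):
    """Minimum i such that log^(i) n <= 1, for n >= 1."""
    i, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        i += 1
    return i

for n in (2, 16, 65536, 2 ** 64):
    print(n, log_star(n))
```

Even for $N = 2^{64}$ the value of $\log^* N$ is a single-digit integer, which is why a bound in terms of $\Delta \log^* N$ is a much milder dependence on $N$ than $\log \log N$.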
An alternate way to get around the barrier of Lemma 3.14, which insists that schemes must
have linear dependence on $H(P)$ or make some error, is to have schemes that do not work for
all possible pairs of distributions $P$ and $Q$. As it turns out the scheme from Theorem 1.4 does
have this behavior for many natural distributions. In Theorem 3.8 we show that our scheme from
Theorem 1.4 works without error and with the same performance as long as $P$ (or $Q$) is close to a
"flat distribution" (uniform over a subset), or a geometric distribution, or a binomial distribution.
We stress that the scheme is not particularly carefully tailored to the class of distributions (though
of course the encodings and decodings do depend on the distributions), but naturally adapts to
being error-free for the above classes.

1.2 Techniques: Graph Coloring 

While the most natural framework for studying our problem is as a question of communication 
complexity of a relational problem (as in [6]), this turns out not to be the most useful for studying 
the deterministic communication complexity. Indeed, as pointed out earlier, the modern stress in 
communication complexity is often on designing and understanding the limits of protocols that are 
interactive and use shared randomness, while in our case the thrust is in the opposite direction.

It turns out our questions are naturally also captured as graph-coloring questions. Furthermore 
such questions (or related ones) have been studied in the literature on distributed computing in 
the attempt to color graphs in a local distributed manner. In particular, the work of Linial [7] shows
that a "local" algorithm for 3-coloring a cycle, due to Cole and Vishkin [2], implies that a large 
"high-degree graph" is 3-colorable. The ideas of Cole and Vishkin [2] and Linial [7] turn out to be 
quite useful in our context. Our work abstracts some of these techniques, and extends them to get 
combinatorial results, which we then convert to efficient compression schemes. 

Uncertainty graphs and Chromatic number. We start by defining a class of structured
combinatorial graphs whose chromatic number turns out to be central to our problems. Let $[N] =
\{1, \ldots, N\}$. Let $S_N$ denote the set of all permutations on $N$ elements, i.e., the set of all bijections
from $[N]$ to itself. For $\pi, \sigma \in S_N$, let $\delta(\pi, \sigma) = \max_{i \in [N]} |\pi^{-1}(i) - \sigma^{-1}(i)|$.

Definition 1.5 (Uncertainty graphs). For integers $N, \ell$ the uncertainty graph $\mathcal{U}_{N,\ell}$ has the elements
of $S_N$ as its vertices, with $\pi \leftrightarrow \sigma$ if (1) $\pi(1) \ne \sigma(1)$ and (2) $\delta(\pi, \sigma) \le \ell$.
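For very small parameters the uncertainty graph can be built by brute force, which makes the definition concrete. The following Python sketch is an illustration under stated assumptions: permutations are 0-indexed tuples with `pi[r]` the element of rank `r + 1`, and the names `delta` and `uncertainty_graph` are not from the paper.

```python
from itertools import permutations

def delta(pi, sigma):
    """max over elements x of |rank of x in pi - rank of x in sigma|."""
    rank_sigma = {x: r for r, x in enumerate(sigma)}
    return max(abs(r - rank_sigma[x]) for r, x in enumerate(pi))

def uncertainty_graph(n, ell):
    """Vertices: all permutations of range(n). Edges: pairs with different
    top elements whose rankings differ by at most ell everywhere."""
    verts = list(permutations(range(n)))
    edges = [(p, s) for i, p in enumerate(verts) for s in verts[i + 1:]
             if p[0] != s[0] and delta(p, s) <= ell]
    return verts, edges

verts, edges = uncertainty_graph(4, 1)
# Colouring each permutation by its top element is proper by definition of adjacency.
print(len(verts), len(edges), all(p[0] != s[0] for p, s in edges))
```

The enumeration is exponential in $N$ and is only meant for sanity checks on tiny instances.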

It turns out that the chromatic number of the uncertainty graphs has a close connection to
uncertain communication schemes. Roughly these graphs emerge from a very restricted version
of the communication problem, where the distributions $P$ and $Q$ are geometric distributions (giving
probability proportional to $\beta^{-\pi^{-1}(i)}$ and $\beta^{-\sigma^{-1}(i)}$ to the element $i \in [N]$). It follows that if
$\delta(\pi, \sigma)$ is small, then $P$ and $Q$ are close to each other. Furthermore, for simplicity these graphs
only consider the case that the message is the element with maximal probability under $P$. To
understand how the chromatic number plays a role, fix a receiver with distribution $Q$ and consider
two possible senders $P$ and $P'$ that could communicate with this receiver. Consider coloring $P$
and $P'$ by $E(P, \operatorname{argmax}_m \{P(m)\})$ and $E(P', \operatorname{argmax}_m \{P'(m)\})$ respectively. This would lead to
distinct colors on pairs $P$ and $P'$ that are too close to each other, provided their messages, i.e.,
$\operatorname{argmax}_m \{P(m)\}$ and $\operatorname{argmax}_m \{P'(m)\}$, are different. This exactly corresponds to adjacency in our
graph: the underlying permutations $\pi$ and $\sigma$ are close, and the top ranked elements are different.
The results of Kalai et al. imply that the "fractional chromatic number" of $\mathcal{U}_{N,\ell}$ is bounded
by $O(\ell)$.$^1$ The (integral) chromatic number on the other hand does not immediately seem to be
bounded as a function of $\ell$ alone. The implication of the low fractional chromatic number is that
the chromatic number of $\mathcal{U}_{N,\ell}$ is at most $O(\ell N \log N)$, but this is worse than the naive upper bound
of $N$, which can be obtained by setting the color of $\pi$ to be $\pi(1)$. (By definition of adjacency this
is a valid coloring.) Our main technical contribution is in obtaining some non-trivial upper bounds
on the chromatic number of this graph.

To derive our upper bounds, we look at "coarsened" versions of the graph $\mathcal{U}_{N,\ell}$. For positive
integer $k$, we say that $\pi : [k] \to [N]$ is a $k$-subpermutation if $\pi$ is injective. We let $S_{N,k}$ denote the
set of all $k$-subpermutations on $[N]$. For $k' > k$, we say a subpermutation $\pi' : [k'] \to [N]$ extends the
subpermutation $\pi : [k] \to [N]$ if $\pi'(i) = \pi(i)$ for all $i \in [k]$. For $k$-subpermutations $\pi$ and $\sigma$, we let

$\delta(\pi, \sigma) = \min_{\pi', \sigma' \text{ extending } \pi, \sigma} \{\delta(\pi', \sigma')\}$,

where the minimum is over permutations $\pi', \sigma' \in S_N$.

Definition 1.6 (Restricted Uncertainty graphs). For integers $N, \ell$ and $k$ the $k$-restricted
uncertainty graph $\mathcal{U}_{N,\ell,k}$ has the elements of $S_{N,k}$ as its vertices, with $\pi \leftrightarrow \sigma$ if (1) $\pi(1) \ne \sigma(1)$ and
(2) $\delta(\pi, \sigma) \le \ell$.

Note that $\mathcal{U}_{N,\ell,N} = \mathcal{U}_{N,\ell}$. We derive our upper bounds on the chromatic number of $\mathcal{U}_{N,\ell}$ by
giving non-trivial upper bounds on the chromatic number of $\mathcal{U}_{N,\ell,k}$.

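Lemma 1.7.
1. For every $k < k'$, $\chi(\mathcal{U}_{N,\ell,k'}) \le \chi(\mathcal{U}_{N,\ell,k})$.

2. For every $N, \ell$, $\chi(\mathcal{U}_{N,\ell}) \le O(\ell^2 \log N)$.

3. For every $N$, $\ell$ and $k$ that is an integral multiple of $\ell$, we have $\chi(\mathcal{U}_{N,\ell,k}) \le O(2^{k} \log^{(k/\ell)} N)$.

4. For every $N$, $\ell$ and $k$ that is an integral multiple of $\ell$, we have $\chi(\mathcal{U}_{N,\ell,k}) \ge \log^{(\lceil 2k/\ell \rceil)}(N/\ell)$.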

As an immediate application we get the following theorem. 

Theorem 1.8. For every $N$ and $\ell$, we have $\chi(\mathcal{U}_{N,\ell}) \le O(\min\{\ell^2 \log N, 2^{\ell \log^* N}\})$.

Unfortunately, the lower bound from Part (4) of Lemma 1.7 goes to 0 as $k \to N$ and so we don't
get a growing function of $N$ as a lower bound. However, it does rule out most natural strategies
for coloring $\mathcal{U}_{N,\ell}$, and shows limitations of the intuition that suggests $\mathcal{U}_{N,\ell}$ may be colorable with $f(\ell)$
colors independent of $N$. This is so since the intuition as well as most natural strategies only use
the top $O(\ell)$ ranking elements of a permutation $\pi$ to determine its color; and such strategies are
inherently limited. In particular, it shows that there is no hope to extend the methods of Kalai et
al. in a simple way to get a deterministic UCS.

Organization of this paper. We start with the analysis of the chromatic number in Section 2.
We then use the methods to build uncertain compression schemes in Section 3.

$^1$ The fractional chromatic number of a graph $G$ is the smallest positive real $w$ such that there exists a collection
of independent sets $I_1, \ldots, I_t$ in $G$ with weights $w_1, \ldots, w_t$ such that $\sum_{j=1}^{t} w_j = w$ and for every vertex $u \in V(G)$ it
is the case that $\sum_{j : u \in I_j} w_j \ge 1$.

2 Uncertainty Graphs

We start with some elementary material in Section 2.1 that already allows us to prove Parts (1) and
(2) of Lemma 1.7. The lower bound mentioned in Part (4) of Lemma 1.7 also follows relatively easily
from a result of Linial [7] and we show this in Section 2.2. Our main contribution, in Section 2.3,
gives the upper bound from Part (3) of Lemma 1.7.

2.1 Preliminaries 

We recall the concept of a homomorphism of graphs: For graphs $G = (V,E)$ and $G' = (V',E')$, we
say that $\phi : V \to V'$ is a homomorphism from $G$ to $G'$ if $(u, v) \in E \Rightarrow (\phi(u), \phi(v)) \in E'$. We say
$G$ is homomorphic to $G'$ if there exists a homomorphism from $G$ to $G'$.

Proposition 2.1. For every $N$, $\ell \ge 1$ and $k' \le k \le N$, the $k$-restricted uncertainty graph $\mathcal{U}_{N,\ell,k}$ is
homomorphic to the $k'$-restricted uncertainty graph $\mathcal{U}_{N,\ell,k'}$.

Proof. We construct the homomorphism $\phi$ from $\mathcal{U}_{N,\ell,k}$ to $\mathcal{U}_{N,\ell,k'}$ as follows: For $\pi =
(\pi(1), \ldots, \pi(k)) \in S_{N,k}$ let $\phi(\pi) = (\pi(1), \ldots, \pi(k')) \in S_{N,k'}$. From the definitions it follows that
this is a homomorphism. □

Proposition 2.2. For every $G$ and $G'$ such that $G$ is homomorphic to $G'$, we have $\chi(G) \le \chi(G')$.

Proof. Follows from the composability of homomorphisms and the fact that $G$ is $k$-colorable if and
only if it is homomorphic to $K_k$, the complete graph on $k$ vertices. □

Part (1) of Lemma 1.7 follows immediately from Propositions 2.1 and 2.2.

Proposition 2.3. For every $N$, $\ell$, and $k \ge \ell + 1$ the fractional chromatic number of the restricted
uncertainty graph $\mathcal{U}_{N,\ell,k}$ is at most $4\ell$.

Proof. For every function $f : [N] \to [2\ell]$ we associate the set
$I_f = \{\pi \in S_{N,k} \mid f(\pi(1)) = 1 \text{ and } f(\pi(j)) \ne 1 \ \forall j \in \{2, \ldots, \ell + 1\}\}$.

We claim that $I_f$ is an independent set of $\mathcal{U}_{N,\ell,k}$ for every $f$. To see this consider an edge $(\pi, \sigma)$
and suppose $\pi \in I_f$. Then $\sigma(1) \in \{\pi(2), \ldots, \pi(\ell + 1)\}$ and so $f(\sigma(1)) \ne 1$ and so $\sigma \notin I_f$.

Next we note that for every $\pi$, the probability that $\pi \in I_f$ for $f$ chosen uniformly at random is

$\frac{1}{2\ell} \cdot \left(1 - \frac{1}{2\ell}\right)^{\ell} \ge \frac{1}{4\ell}$.

Thus if we give each $I_f$ a weight of $4\ell/(2\ell)^N$, then we have that the weight of independent sets
containing any given vertex $\pi$ is at least one, while the sum of all weights is $4\ell$, thus yielding the
claimed bound on the fractional chromatic number. □
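The two facts used in this proof, that each $I_f$ is independent and that a fixed vertex lies in $I_f$ for at least a $1/(4\ell)$ fraction of the functions $f$, can be verified exhaustively on a tiny instance. The sketch below is illustrative only: it takes $k = N = 4$ and $\ell = 1$ (so full permutations), uses 0-indexed tuples, and the helper names are assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

def delta(pi, sigma):
    """max over elements x of |rank of x in pi - rank of x in sigma|."""
    rank_sigma = {x: r for r, x in enumerate(sigma)}
    return max(abs(r - rank_sigma[x]) for r, x in enumerate(pi))

def in_I_f(f, pi, ell):
    """pi is in I_f iff f(pi(1)) = 1 and f(pi(j)) != 1 for j = 2..ell+1."""
    return f[pi[0]] == 1 and all(f[pi[j]] != 1 for j in range(1, ell + 1))

n, ell = 4, 1
verts = list(permutations(range(n)))
functions = list(product(range(1, 2 * ell + 1), repeat=n))  # all f: [N] -> [2*ell]

# No edge of the uncertainty graph has both endpoints in the same I_f.
independent = all(
    not (in_I_f(f, p, ell) and in_I_f(f, s, ell))
    for f in functions for p in verts for s in verts
    if p[0] != s[0] and delta(p, s) <= ell
)

# Fraction of functions f whose set I_f contains a given vertex, minimised over vertices.
min_prob = min(
    Fraction(sum(in_I_f(f, p, ell) for f in functions), len(functions))
    for p in verts
)
print(independent, min_prob, min_prob >= Fraction(1, 4 * ell))
```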

The following is a well-known connection between fractional chromatic number and chromatic 
number. 

Proposition 2.4. For every graph $G$, $\chi(G) \le \chi_f(G) \cdot \ln |V(G)|$.

We are now ready to prove Part (2) of Lemma 1.7.

Lemma 2.5. $\chi(\mathcal{U}_{N,\ell}) \le \chi(\mathcal{U}_{N,\ell,\ell+1}) \le 4\ell(\ell + 1) \ln N$.

Proof. The first inequality follows from Propositions 2.1 and 2.2. The second one follows from
Propositions 2.4 and 2.3 and the fact that $\mathcal{U}_{N,\ell,\ell+1}$ has at most $N^{\ell+1}$ vertices. □

2.2 Lower Bound on Chromatic Number

We now prove Part (4) of Lemma 1.7, giving a lower bound on $\chi(\mathcal{U}_{N,\ell,k})$. We use a lower bound on
a somewhat related family of graphs due to Linial [7].

Definition 2.6 (Shift graphs). For integers $N$ and $k \le N$, we say that $\pi \in S_{N,k}$ is a left shift of
$\sigma \in S_{N,k}$ if $\pi(i) = \sigma(i + 1)$ for $i \in [k - 1]$ and $\pi(k) \ne \sigma(1)$. We say $\pi$ is a right shift of $\sigma$ if $\sigma$ is a
left shift of $\pi$, and we say $\pi$ is a shift of $\sigma$ if $\pi$ is a left shift or a right shift of $\sigma$. For integers $N$
and $k$, the shift graph $\mathcal{S}_{N,k}$ is given by $V(\mathcal{S}_{N,k}) = S_{N,k}$ with $(\pi, \sigma) \in E(\mathcal{S}_{N,k})$ if $\pi$ is a shift of $\sigma$.
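As with the uncertainty graphs, small shift graphs can be enumerated directly from the definition. The following Python sketch is illustrative: subpermutations are 0-indexed tuples of distinct elements of `range(n)`, and the function names are hypothetical.

```python
from itertools import permutations

def is_left_shift(pi, sigma):
    """pi(i) = sigma(i+1) for i in [k-1], and pi(k) != sigma(1)."""
    k = len(pi)
    return all(pi[i] == sigma[i + 1] for i in range(k - 1)) and pi[k - 1] != sigma[0]

def shift_graph(n, k):
    """Vertices: injective k-tuples over range(n). Edges: unordered shift pairs."""
    verts = list(permutations(range(n), k))
    edges = {frozenset((p, s)) for p in verts for s in verts if is_left_shift(p, s)}
    return verts, edges

verts, edges = shift_graph(4, 2)
print(len(verts), len(edges))
```

For $N = 4$ and $k = 2$ each vertex $(a, b)$ has exactly two left shifts $(b, c)$ with $c \notin \{a, b\}$, so the graph has 12 vertices and 24 edges.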

Theorem 2.7 (Linial [7, Proof of Theorem 2.1]). For every odd $k$, $\chi(\mathcal{S}_{N,k}) \ge \log^{(k-1)} N$.

(We note that the notation in [7] is somewhat different: The graph $\mathcal{S}_{N,k}$ is denoted $B_{N,t}$ for
$t = (k - 1)/2$ in [7].)

We show that the uncertainty graphs contain a subgraph isomorphic to the shift graph. This
gives us our lower bound on the chromatic number of uncertainty graphs.

Lemma 2.8. For every $N$, $\ell$ and $k$ that is an integral multiple of $\ell$, we have
$\chi(\mathcal{U}_{N,\ell,k}) \ge \log^{(\lceil 2k/\ell \rceil)}(N/\ell)$.

Proof. First without loss of generality we only consider the case of even $\ell$. Then we reduce to the
case $\ell = 2$, by considering only those permutations $\pi$ which fix $\pi(i) = i$ if $\ell/2$ does not divide
$i$. This still leaves us with $2N/\ell$ unfixed elements, and subpermutations from $S_{2N/\ell, 2k/\ell}$ that are
within distance 2 of each other are within distance $\ell$ when mapped back to $S_{N,k}$.

So we assume $\ell = 2$ and show that $\mathcal{U}_{N,2,k}$ contains a subgraph isomorphic to the shift graph
$\mathcal{S}_{N,k}$. Consider the map $\phi$ from $V(\mathcal{S}_{N,k})$ to $V(\mathcal{U}_{N,2,k})$ which sends $\pi = (\pi(1), \ldots, \pi(k))$ to $\phi(\pi) =
\sigma = (\sigma(1), \ldots, \sigma(k))$ as follows: Let $t = \lfloor k/2 \rfloor$. Then $\sigma(2i) = \pi(t + i)$ and $\sigma(2i + 1) = \pi(t - i)$,
$\sigma(t + i) = \pi(2i)$ and $\sigma(t - i) = \pi(2i + 1)$. It is easy to verify that the map is a bijection and if $\pi$
and $\pi'$ are shifts of each other, then $\phi(\pi)$ and $\phi(\pi')$ are within distance 2 of each other. It follows
that $\mathcal{U}_{N,2,k}$ contains a copy of $\mathcal{S}_{N,k}$ and so $\chi(\mathcal{U}_{N,2,k}) \ge \chi(\mathcal{S}_{N,k}) \ge \log^{(k-1)} N$. □

2.3 Upper Bound on Chromatic Number 

In this section we give an upper bound on the chromatic number of the uncertainty graphs. We first
describe our strategy. Fix $N$ and $\ell$. Now for every $k$, we know that there is a homomorphism from
$\mathcal{U}_{N,\ell,k}$ to $\mathcal{U}_{N,\ell,k-1}$. However we note that if we jump from $\mathcal{U}_{N,\ell,k}$ to $\mathcal{U}_{N,\ell,k-\ell}$ then the homomorphism
has an even nicer property. To describe this property, we introduce a new parameter associated with
the homomorphism from $\mathcal{U}_{N,\ell,k}$ to $\mathcal{U}_{N,\ell,k-\ell}$. Let us denote this homomorphism $\phi_k$. For $\pi \in S_{N,k}$ let
$d_k(\pi) = |\{\phi_k(\sigma) \mid (\pi, \sigma) \in E(\mathcal{U}_{N,\ell,k})\}|$. Note that $d_k(\pi)$ is independent of $\pi$ and so we just denote
it $d_k$. We note first that $d_k$ is small.

Recall that $\phi_k : S_{N,k} \to S_{N,k-\ell}$ maps $\pi : [k] \to [N]$ to $\pi' : [k - \ell] \to [N]$ by setting
$\pi'(i) = \pi(i)$.

Claim 2.9. For every $k$, $d_k \le (2\ell + 1)^{k}$.

Proof. Let $(\sigma, \pi) \in E(\mathcal{U}_{N,\ell,k})$; then $\delta(\sigma, \pi) \le \ell$. In particular for every $i \in [k - \ell]$, there
exists $j(i) \in \{-\ell, \ldots, \ell\}$ such that $\sigma(i) = \pi(i + j(i))$. Thus the sequence $j(1), \ldots, j(k - \ell)$ completely
specifies $\phi_k(\sigma)$. Since the number of such sequences is at most $(2\ell + 1)^{k-\ell}$, we get our claim. □

The next lemma shows that a homomorphism with a small $d$-value yields especially good
colorings.

Lemma 2.10. Let $\phi$ be a homomorphism from $G$ to $H$ and let $c = \chi(H)$ and $d =
\max_{v \in V(G)} |\{\phi(w) \mid (v, w) \in E(G)\}|$. Then $\chi(G) \le 2d(d + 1) \log c = O(d^2 \log c)$.

Proof. For integers $t$ and $M$, we start by building a small family of hash functions $\mathcal{H} =
\{h_1, \ldots, h_M\} \subseteq \{h : [c] \to [t]\}$ with the property that for every subset $S \subseteq [c]$, with $|S| \le d$,
and for every $i \in [c] - S$, there exists $j \in [M]$ such that $h_j(i) \notin \{h_j(i') \mid i' \in S\}$.

Given such a hash family, we claim there is a coloring of $G$ with $t \cdot M$ colors. To get such a
coloring, let $\chi'$ be a coloring of $H$ with colors $[c]$. Now, consider $v \in V(G)$ and let $S_v = \{\chi'(\phi(w)) \mid
(w, v) \in E(G)\}$. By the definition of $d$, we have $|S_v| \le d$. Also since $\chi'$ is a coloring of $H$
and $\phi$ is a homomorphism, we have $\chi'(\phi(v)) \notin S_v$. Thus by the property of $\mathcal{H}$, we have that
there exists a $j = j(v)$ such that $h_j(\chi'(\phi(v))) \notin \{h_j(i') \mid i' \in S_v\}$. We let the coloring $\chi$ of $G$ be
$\chi(v) = (j(v), h_{j(v)}(\chi'(\phi(v))))$. Syntactically it is clear that this is a $t \cdot M$ coloring of $G$. To see it
is valid, consider $(v, w) \in E(G)$. If $j(v) \ne j(w)$ then we are done. Else, suppose $j(v) = j(w) = j$.
Then by definition of $S_v$ we have $\chi'(\phi(w)) \in S_v$ and so $h_j(\chi'(\phi(v))) \ne h_j(\chi'(\phi(w)))$ and
thus $\chi(v) \ne \chi(w)$ as desired.

To conclude we need to give an upper bound on t and M. 

Claim 2.11. There exists such a hash family with $t \le 2d$ and $M \le \log(c^{d+1})$.

Proof. The proof is an elementary probabilistic method argument. Let $t = 2d$. We pick members
of $\mathcal{H}$ uniformly at random from $\{h : [c] \to [t]\}$. Fix a set $S$ with $|S| \le d$ and $i \in [c] - S$. Say
that $h$ separates $i$ from $S$ if $h(i) \notin \{h(i') \mid i' \in S\}$. The probability that a random $h$ separates $i$
from $S$ is at least $1/2$ and the probability that there does not exist $h \in \mathcal{H}$ separating $i$ from $S$ is
at most $2^{-M}$. The probability that there exist $S$ and $i \in [c] - S$ such that there does not exist
$h \in \mathcal{H}$ separating $i$ from $S$ is strictly less than $c^{d+1} \cdot 2^{-M}$. It follows that if $M = \log c^{d+1}$ then such
a family $\mathcal{H}$ exists. □

The lemma follows. □ 
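The coloring in the proof of Lemma 2.10 can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the paper's construction verbatim: instead of fixing a family that separates every pair $(S, i)$ in advance, the sketch keeps drawing random hash functions until every vertex of the given graph is served, and the function name, the adjacency-dictionary input format and the seed are all hypothetical choices.

```python
import random

def color_via_homomorphism(adj, phi, chi_h, seed=0):
    """Colour G using a homomorphism phi: G -> H and a proper colouring chi_h of H.

    adj:   dict vertex -> list of neighbours in G (symmetric)
    phi:   dict vertex of G -> vertex of H
    chi_h: dict vertex of H -> colour in range(c)
    Returns (chi, t, M) with chi[v] = (j(v), h_{j(v)}(chi'(phi(v)))).
    """
    rng = random.Random(seed)
    c = max(chi_h.values()) + 1
    base = {v: chi_h[phi[v]] for v in adj}                # chi'(phi(v))
    S = {v: {base[w] for w in adj[v]} for v in adj}        # S_v
    assert all(base[v] not in S[v] for v in adj), "phi must be a homomorphism"
    d = max(len({phi[w] for w in adj[v]}) for v in adj)
    t = max(1, 2 * d)                                      # hash range, t = 2d
    family, chi = [], {}
    while len(chi) < len(adj):
        h = [rng.randrange(t) for _ in range(c)]           # random h: [c] -> [t]
        j = len(family)
        family.append(h)
        for v in adj:
            # Serve v with the first function that separates base[v] from S_v.
            if v not in chi and h[base[v]] not in {h[s] for s in S[v]}:
                chi[v] = (j, h[base[v]])
    return chi, t, len(family)

# Example: the 6-cycle maps homomorphically onto the triangle via v -> v mod 3.
adj = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
phi = {v: v % 3 for v in range(6)}
chi_h = {0: 0, 1: 1, 2: 2}
chi, t, M = color_via_homomorphism(adj, phi, chi_h)
print(chi, t, M)
```

The resulting coloring is proper for the same reason as in the proof: if two adjacent vertices pick the same index $j$, the function $h_j$ separates their underlying colors. The number of functions drawn here is data-dependent and is not claimed to match the bound $M \le \log(c^{d+1})$.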

We are now ready to prove Part (3) of Lemma 1.7, restated below.

Lemma 2.12. There exists a constant $c$ such that for every $N, \ell, k$, we have
$\chi(\mathcal{U}_{N,\ell,k}) \le 2^{ck \log \ell} \log^{(\lfloor (k-1)/\ell \rfloor)} N$.

Proof. We prove the lemma by induction on $k$. For notational simplicity assume $k - 1$ is a
multiple of $\ell$. For $k \le \ell$ the lemma is immediate from the fact that $\chi(\mathcal{U}_{N,\ell,k}) \le N$.
Assume the lemma is true for $k - \ell$. Then, by Lemma 2.10 we have
$\chi(\mathcal{U}_{N,\ell,k}) \le 2 d_k (d_k + 1) \cdot \log(\chi(\mathcal{U}_{N,\ell,k-\ell})) \le 4 d_k^2 \log \chi(\mathcal{U}_{N,\ell,k-\ell})$.
By Claim 2.9, $d_k \le (2\ell + 1)^k \le (4\ell)^k$ and so
$\chi(\mathcal{U}_{N,\ell,k}) \le 4 (4\ell)^{2k} \log\left(2^{c(k-\ell) \log \ell} \log^{((k-\ell-1)/\ell)} N\right) \le 2^{ck \log \ell} \log^{((k-1)/\ell)} N$
for a suitably large $c$. □

3 Uncertain Communication 

We now convert some of the methods from the previous section into schemes for uncertain
compression. In Section 3.1 we derive a simple compression scheme based on the relationship between
fractional chromatic number and chromatic number from Section 2.1. We then use the "nested
series of homomorphisms" from Section 2.3 to derive a second compression scheme in Section 3.2.
The compression scheme of Section 3.2 can make errors with positive probability and has a
non-linear dependence on entropy. In Section 3.3 we show that for some natural distributions, this
scheme is error-free. In Section 3.4 we show how an error-free scheme working for all distributions
would automatically have linear dependence on the entropy, suggesting some of the weaknesses in
Section 3.2 are necessary.



3.1 A simple, zero-error compression scheme 

Our first construction uses the notion of an isolating hash family. For positive integers $\ell$, $N$ and
$m \in [N]$ and $S \subseteq [N] - \{m\}$, we say that a function $h : [N] \to \{0,1\}^{\ell}$ isolates $m$ from $S$ if
$h(m) \notin \{h(m') \mid m' \in S\}$. We say that a hash family $\mathcal{H}_\ell = \{h_{1,\ell}, \ldots, h_{M,\ell}\}$ is $(N, \ell)$-isolating if for
every $S \subseteq [N]$ with $|S| \le 2^{\ell}$, and for every $m \in [N] - S$, there exists $j = j(m, S)$ such that

$h_{j,\ell}(m) \notin h_{j,\ell}(S) = \{h_{j,\ell}(m') \mid m' \in S\}$.

We note first that small isolating families exist and then give a compression scheme based on
small isolating families.

Lemma 3.1. For every $\ell$ and $N$, there exists an $(N, \ell)$-isolating family of size at most $2^{\ell} \cdot \log N$.

Proof. The proof is a straightforward application of the probabilistic method. We pick $\mathcal{H} =
\{h_1, \ldots, h_M\}$ by picking each $h_i$ uniformly and independently from the set of all functions from $[N]$
to $\{0,1\}^{\ell}$. Fix $m \notin S \subseteq [N]$. The probability that a randomly chosen $h$ isolates $m$ from $S$ is at
least $1/2$. Thus the probability that no $h_i$ in $\mathcal{H}$ isolates $m$ from $S$ is at most $2^{-M}$.
Taking the union bound over all $m, S$ we find that the probability that $\mathcal{H}$ does not isolate some $m$
from $S$ is at most $N^{2^{\ell}}/2^{M}$. We conclude that $M \le 2^{\ell} \cdot \log N$ suffices for the existence of such an
$\mathcal{H}$. □

We are now ready to describe our encoding and decoding schemes.

Encoding: Given $m, P$, let $S = \{m' \in [N] \setminus \{m\} \mid P(m') \ge P(m)/2^{2\Delta}\}$ and let $\ell = \log_2 1/P(m) +
2\Delta$. Let $\mathcal{H}$ be an $(N,\ell)$-isolating family of size $M$ and let $\mathcal{H} = \{h_{1,\ell}, \ldots, h_{M,\ell}\}$. Now let $j \in [M]$
be such that $h_{j,\ell}(m) \notin \{h_{j,\ell}(m') \mid m' \in S\}$. The encoding $E(P,m)$ is defined to be $(j, h_{j,\ell}(m))$.

Decoding: Given $Q$ and $y = (j,z) \in \mathbb{Z}^+ \times \{0,1\}^*$, let $\ell = |z|$ and let $\hat{m} =
\mathrm{argmax}_{m \in [N] \,:\, h_{j,\ell}(m) = z} \{Q(m)\}$. The decoding of the pair $(Q,y)$ is given by $D(Q,y) = \hat{m}$.

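The encoding and decoding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `make_hash` is hypothetical and derives each $h_{j,\ell}$ from a fixed integer seed known to both parties, and the encoder searches for the first index $j$ that isolates $m$ from $S$ instead of consulting a family verified to be $(N,\ell)$-isolating. The length $\ell$ is rounded up to an integer and passed explicitly rather than read off as $|z|$.

```python
import math
import random

def make_hash(j, ell, n):
    """Hypothetical stand-in for h_{j,ell}: [n] -> {0,1}^ell.
    Both parties derive the same table from the fixed integer seed."""
    rng = random.Random(j * 1_000_003 + ell)
    return [rng.getrandbits(ell) for _ in range(n)]

def encode(P, m, delta):
    """Return (j, ell, h_j(m)) where h_j isolates m from the set S of the text."""
    n = len(P)
    # Assumption: round ell up to an integer, and keep it at least 1.
    ell = max(1, math.ceil(math.log2(1 / P[m])) + 2 * delta)
    S = [x for x in range(n) if x != m and P[x] >= P[m] / 2 ** (2 * delta)]
    j = 0
    while True:
        h = make_hash(j, ell, n)
        if all(h[x] != h[m] for x in S):   # h isolates m from S
            return j, ell, h[m]
        j += 1

def decode(Q, code):
    """Return the most likely message under Q among those hashing to z."""
    j, ell, z = code
    h = make_hash(j, ell, len(Q))
    candidates = [x for x in range(len(Q)) if h[x] == z]
    return max(candidates, key=lambda x: Q[x])
```

With a prior $Q$ whose distance from $P$ is at most the `delta` given to the encoder, `decode(Q, encode(P, m, delta))` should return `m`, which is the behaviour that Proposition 3.2 establishes for the actual scheme.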
Our next proposition verifies the correctness of the compression scheme. 

Proposition 3.2. For every pair of distributions $P, Q$ such that $\delta(P,Q) \le \Delta$, and for every
message $m \in [N]$, it is the case that $D(Q, E(P,m)) = m$.

Proof. Fix $P$, $Q$ and $m$ such that $\delta(P,Q) \le \Delta$. Let $E(P,m) = (j,z)$ with $\ell = |z|$ and let
$D(Q,(j,z)) = \hat{m}$. We will show that $\hat{m} = m$. By definition of $E$, we have $h_{j,\ell}(m) = z$ and by
definition of $D$ we have $h_{j,\ell}(\hat{m}) = z$. Thus, by the condition that $\hat{m}$ maximizes probability under
$Q$ of messages satisfying $h_{j,\ell}(m') = z$, we have $Q(\hat{m}) \ge Q(m)$. Since the distance of $P$ and $Q$ is
at most $\Delta$, we have $P(m) \le Q(m) 2^{\Delta}$ and $P(\hat{m}) \ge Q(\hat{m})/2^{\Delta}$. Combining the inequalities we get
$P(\hat{m}) \ge P(m)/2^{2\Delta}$. Now let $S = \{m' \in [N] - \{m\} \mid P(m') \ge P(m)/2^{2\Delta}\}$. We have $\hat{m} \in S \cup \{m\}$.
But by definition of $j$, we have $h_{j,\ell}(m) \notin \{h_{j,\ell}(m') \mid m' \in S\}$ and since $h_{j,\ell}(m) = h_{j,\ell}(\hat{m})$, we must
have $m = \hat{m}$. □

Finally we analyze the performance of our scheme.

Lemma 3.3. The expected length of the encoding $E$ is $O(H(P) + \Delta + \log\log N)$.

Proof. Fix a message $m \in [N]$. Then we have $\ell \le 1 + \log 1/P(m) + 2\Delta$ and $M \le 2^\ell \log N$. The index
$j \in [M]$ takes at most $\log M \le \ell + \log\log N$ bits and the hash value takes $\ell$ bits. Thus, the length
of $E(P,m)$ is at most $2\ell + \log\log N = O(\log 1/P(m) + \Delta + \log\log N)$. Taking expectation over $m$
drawn from $P$, we have that the expected length of the encoding is at most $O(H(P) + \Delta + \log\log N)$. □

Theorem 1.3 follows immediately from Proposition 3.2 and Lemma 3.3.

3.2 Compression with error in the low entropy setting 

Our compression for the low entropy setting (with better dependence on N) relies on an extension 
of our coloring scheme for the uncertainty graphs. We describe this extension in the next section 
and then use that to present our compression scheme afterwards. 

3.2.1 Compression for chains 

We start with some terminology. We say that a finite sequence of sets $A_0, \ldots, A_k$ with $A_i \subseteq [N]$ is
a chain in $[N]$ if $|A_0| = 1$ and $A_i \subseteq A_{i+1}$ for every $i$. We say that $w$ is the leader of the chain if
$A_0 = \{w\}$. We use $\mathrm{Chain}(N)$ to denote the set of all chains in $[N]$.

In this section we will show how to compress the leader of a chain so that it is unambiguous
relative to "nearby" chains. This is in the spirit of the coloring of uncertainty graphs. Indeed vertices
of the uncertainty graph $\mathcal{U}_{N,\ell,k}$ correspond to chains with the vertex $(\pi(1), \ldots, \pi(k))$ corresponding
to the chain $\mathcal{A}$ with $A_0 = \{\pi(1)\}$ and $A_i = \{\pi(1), \ldots, \pi(\ell \cdot i)\}$ for $i \ge 1$. The compression scheme
will thus be similar to the coloring scheme, however there are two distinguishing factors: We will
want to compress some chains more than others, a notion that would correspond to asking some
vertices to use small colors while allowing others to use larger ones. Furthermore our chains will
now grow arbitrarily fast (and not just in steps of 1 or more generally $\ell$). We now describe the
precise problem.

For a chain $\mathcal{A} = (A_0, \ldots, A_k)$ we say the length of the chain, denoted $\mathrm{lgt}(\mathcal{A})$, is the parameter
$k$. We use $\mathrm{sz}(\mathcal{A})$ to denote the size of the final set $|A_k|$. For a chain $\mathcal{A}$ of length at least $i$, we let $\mathcal{A}_i$
denote its prefix of length $i$, i.e., $\mathcal{A}_i = (A_0, \ldots, A_i)$.

For chain $\mathcal{A} = (A_0, \ldots, A_k)$ and chain $\mathcal{B} = (B_0, \ldots, B_{k-d})$, we say $\mathcal{B}$ is within distance $d$ from
$\mathcal{A}$ if for all $i \in \{0, \ldots, k-d\}$, $A_{i-d} \subseteq B_i \subseteq A_{i+d}$ (where we consider sets with negative index to be
the empty set). We denote the set of all chains that are within distance $d$ from $\mathcal{A}$ by $S^d(\mathcal{A})$. Our
goal next is to compress the leader of chains so that the length of the compression is small as a
function of $\mathrm{sz}(\mathcal{A})$, while it remains unambiguous to chains that are nearby.
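The chain and distance definitions above translate directly into set operations. The following Python sketch is illustrative only; the function names `is_chain` and `within_distance` are not from the paper, and chains are represented as lists of Python sets.

```python
def is_chain(sets):
    """A_0 is a singleton and A_i is contained in A_{i+1} for every i."""
    return len(sets[0]) == 1 and all(a <= b for a, b in zip(sets, sets[1:]))

def within_distance(B, A, d):
    """True if B = (B_0..B_{k-d}) is within distance d of A = (A_0..A_k),
    i.e. A_{i-d} <= B_i <= A_{i+d} for all i, with A_i empty for i < 0."""
    k = len(A) - 1
    if len(B) - 1 != k - d:
        return False

    def a(i):
        return A[i] if i >= 0 else set()

    return all(a(i - d) <= B[i] <= a(i + d) for i in range(k - d + 1))
```

For example, with `A = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}]` the chain `[{2}, {1, 2}, {1, 2, 4}]` is within distance 1 of `A` even though its leader differs, which is why the coloring below has to separate leaders of chains that share a nearby chain.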

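Lemma 3.4. There exists a coloring scheme $\mathrm{Col} : \mathbb{Z}^+ \times \mathrm{Chain}(N) \to \mathbb{Z}^+$ with the following
properties:

1. If $\mathrm{lgt}(\mathcal{A}) \ge 2k$, then for every $s \ge \mathrm{sz}(\mathcal{A}_{2k})$, $\mathrm{Col}(s, \mathcal{A}_{2k}) \le 2^{6(s+1)} \log^{(k)} N$.

2. Let $\mathcal{A}$ and $\mathcal{A}'$ be chains of the same length, with $\mathrm{lgt}(\mathcal{A}) \ge 2k$ and of size at most $s$. Then, if
$S^1(\mathcal{A}) \cap S^1(\mathcal{A}') \ne \emptyset$ and $A_0 \ne A'_0$, then $\mathrm{Col}(s, \mathcal{A}_{2k}) \ne \mathrm{Col}(s, \mathcal{A}'_{2k})$.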



Proof. Let $c_{k,s} = 2^{6(s+1)} \log^{(k)} N$. Fix $s \ge \mathrm{sz}(\mathcal{A})$. We now describe a coloring scheme of a chain
$\mathcal{A}_{2k}$ with $c_{k,s}$ colors, using induction on $k$.

For the base case $k = 0$, let $w$ be the leader of $\mathcal{A}$. Then $\mathcal{A}_0$ gets the color $\mathrm{Col}(s, \mathcal{A}_0) = w$, so
clearly $\mathrm{Col}(s, \mathcal{A}_0) \le N = \log^{(0)} N$.

For $k \ge 1$, let $\ell = 2.5s$ and let $\mathcal{H}$ be a $(c_{k-1,s}, \ell)$-isolating family (where isolating families
were defined as in Section 3.1). By Lemma 3.1 such a family of size $M = 2^\ell \log c_{k-1,s}$ exists, so
let $\mathcal{H} = \{h_i\}_{i=1}^{M}$. Let $T = \{\mathcal{B} \mid \mathrm{lgt}(\mathcal{B}) = 2k-2,\ \mathcal{B} \in S^2(\mathcal{A}_{2k}),\ \mathrm{Col}(s, \mathcal{B}) \ne \mathrm{Col}(s, \mathcal{A}_{2k-2})\}$. Let
$j \in [2^\ell \log c_{k-1,s}]$ be such that $h_j(\mathrm{Col}(s, \mathcal{A}_{2k-2})) \ne h_j(\mathrm{Col}(s, \mathcal{B}))$ for all $\mathcal{B} \in T$. With these
definitions in place, we define $\mathrm{Col}(s, \mathcal{A}_{2k})$ to be $(j, h_j(\mathrm{Col}(s, \mathcal{A}_{2k-2})))$. We verify below that this is
a "small" coloring and a valid one.

Let us identify the set $[2^\ell \log c_{k-1,s}] \times \{0,1\}^\ell$ with $[2^{2\ell} \log c_{k-1,s}]$. The bound on $c_{k,s}$ follows
from the fact that

$$\begin{aligned}
2^{2\ell} \log c_{k-1,s} &\le 2^{5s} \log\left(2^{6(s+1)} \log^{(k-1)} N\right) \\
&\le 2^{5s} \left(6(s+1) + \log^{(k)} N\right) \\
&\le 2^{6(s+1)} \log^{(k)} N,
\end{aligned}$$

where the final inequality follows from the fact that $2^s \cdot 2^6 \ge 6(s+1)$, which is true for every $s \ge 0$.
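To get a feel for how quickly the color bound shrinks with $k$, the following Python sketch computes the iterated logarithm $\log^{(k)} N$, the quantity $\log^* N$, and the bound $c_{k,s}$. The function names are illustrative, and base-2 logarithms are an assumption made here.

```python
import math

def iter_log(n, k):
    """log^{(k)} n: apply log2 k times (log^{(0)} n = n)."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)
    return x

def log_star(n):
    """Number of times log2 is applied before the value drops to at most 1."""
    count, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

def color_bound(s, k, n):
    """c_{k,s} = 2^{6(s+1)} * log^{(k)} n, as in the proof of Lemma 3.4."""
    return 2 ** (6 * (s + 1)) * iter_log(n, k)
```

After about $\log^* N$ rounds the factor $\log^{(k)} N$ is a constant, so the number of colors depends only on $s$; this is what the low-entropy scheme exploits.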

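We now verify that the coloring satisfies the requirement in Part (2) of the lemma statement,
i.e., that for chains $\mathcal{A}$ and $\mathcal{A}'$ of the same length and size at most $s$, if their prefixes have the
same colors, then they have the same leader. Again we proceed by induction on $k$. Assume
$\mathrm{Col}(s, \mathcal{A}_{2k}) = \mathrm{Col}(s, \mathcal{A}'_{2k})$.

For $k = 0$, by assumption we have $\mathrm{Col}(s, \mathcal{A}_0) = \mathrm{Col}(s, \mathcal{A}'_0)$. But by definition $\mathrm{Col}(s, \mathcal{A}_0) = w$
where $w$ is the leader of $\mathcal{A}_0$. It follows that $w$ is also the leader of $\mathcal{A}'_0$, as claimed.

Now consider $k \ge 1$. Let $\mathrm{Col}(s, \mathcal{A}_{2k}) = (j, h_j(\mathrm{Col}(s, \mathcal{A}_{2k-2})))$ and $\mathrm{Col}(s, \mathcal{A}'_{2k}) =
(j', h_{j'}(\mathrm{Col}(s, \mathcal{A}'_{2k-2})))$. Since $\mathrm{Col}(s, \mathcal{A}_{2k}) = \mathrm{Col}(s, \mathcal{A}'_{2k})$, we have $j = j'$. Moreover,

$$h_j(\mathrm{Col}(s, \mathcal{A}_{2k-2})) = h_j(\mathrm{Col}(s, \mathcal{A}'_{2k-2})).$$

We now show that $\mathcal{A}'_{2k-2} \in S^2(\mathcal{A}_{2k})$. Let $\mathcal{B} \in S^1(\mathcal{A}) \cap S^1(\mathcal{A}')$ and consider its prefix
$(B_0, \ldots, B_{2k-1})$. So, for every $i \in \{0, \ldots, 2k-1\}$,

$$A_{i-1} \subseteq B_i \subseteq A_{i+1} \quad \text{and} \quad A'_{i-1} \subseteq B_i \subseteq A'_{i+1}.$$

In other words, for all $i \in \{0, \ldots, 2k-2\}$,

$$A_{i-2} \subseteq B_{i-1} \subseteq A'_i \subseteq B_{i+1} \subseteq A_{i+2}.$$

Hence $\mathcal{A}'_{2k-2} \in S^2(\mathcal{A}_{2k})$.

From our choice of $j$, $h_j(\mathrm{Col}(s, \mathcal{A}_{2k-2})) = h_j(\mathrm{Col}(s, \mathcal{A}'_{2k-2}))$ for $\mathcal{A}'_{2k-2} \in S^2(\mathcal{A}_{2k})$ only if
$\mathrm{Col}(s, \mathcal{A}'_{2k-2}) = \mathrm{Col}(s, \mathcal{A}_{2k-2})$. In conclusion, $\mathcal{A}_{2k-2}$ and $\mathcal{A}'_{2k-2}$ are both chains of size at most $s$
of the same length, and have the same color. From the induction hypothesis they have the same
leader. □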

3.2.2 The Compression Scheme 

We are now ready to define our final compression scheme.

Encoding: Given $m, P$ define $r = \lfloor -\log P(m) \rfloor$ and $f = 2\lfloor \log^* N \rfloor - 1$. Further define the
chain $\mathcal{A}$ of length $f$ as follows: $A_0 = \{m\}$ and $A_k = \{m' \in [N] \mid |\log 1/P(m') - r| \le (k+1)\Delta + 1\}$ for $k \ge 1$ (so
that $A_k$ is the set of messages of probability roughly $P(m)$ with the difference in logarithms being
at most $(k+1)\Delta + 1$). Let $s = \mathrm{sz}(\mathcal{A})$. The encoding $E_{\mathrm{low}}(P,m) = E(P,m)$ is

$$E(P,m) = \begin{cases} (s, r, \mathrm{Col}(s, \mathcal{A})) & \text{if } s \le 2^{\frac{H(P)}{\epsilon} + 2\Delta\log^* N + 1}, \\ \bot & \text{otherwise.} \end{cases}$$

(We assume that $s$ and $r$ above are encoded in some prefix-free encoding, so that the receiver can
separate the three parts.)

Decoding: The decoding function $D_{\mathrm{low}}(Q,y) = D(Q,y)$ works as follows: If $y = \bot$
then the decoder outputs $\bot$. Else let $y = (s,r,c)$ and let $f = 2\lfloor \log^* N \rfloor - 1$. Let $\mathcal{B} =
(B_0, \ldots, B_{f-1})$ be as follows: $B_0 = \{w\}$ such that $|\log 1/Q(w) - r| \le \Delta + 1$. For $k \ge 1$,
$B_k = \{m' \mid |\log 1/Q(m') - r| \le (k+1)\Delta + 1\}$. Find a chain $\mathcal{A}'$ with the following properties:
$\mathcal{B} \in S^1(\mathcal{A}')$, $\mathrm{lgt}(\mathcal{A}') = f$, $\mathrm{sz}(\mathcal{A}') \le s$ and $\mathrm{Col}(s, \mathcal{A}') = c$. Let $\hat{m}$ be the leader of $\mathcal{A}'$. The decoding
$D(Q,y)$ is set to be $\hat{m}$.
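The two chains used by the encoder and the decoder can be built explicitly for small examples. The Python sketch below is an illustration under assumptions: the function names are hypothetical, logarithms are base 2, the decoder picks the smallest eligible $w$ for $B_0$ (the text allows any), and the coloring step is omitted. It also checks the sandwich relation $A_{k-1} \subseteq B_k \subseteq A_{k+1}$ on which the correctness argument relies.

```python
import math

def band(dist, r, width):
    """Messages x with |log2(1/dist[x]) - r| <= width."""
    return {x for x, p in enumerate(dist)
            if p > 0 and abs(-math.log2(p) - r) <= width}

def encoder_chain(P, m, delta, f):
    """Return r and the chain (A_0, ..., A_f) built from the sender's prior P."""
    r = math.floor(-math.log2(P[m]))
    chain = [{m}] + [band(P, r, (k + 1) * delta + 1) for k in range(1, f + 1)]
    return r, chain

def decoder_chain(Q, r, delta, f):
    """Return the chain (B_0, ..., B_{f-1}) built from the receiver's prior Q."""
    w = min(band(Q, r, delta + 1))  # assumption: any eligible w would do
    return [{w}] + [band(Q, r, (k + 1) * delta + 1) for k in range(1, f)]

def sandwiched(B, A):
    """Check A_{k-1} <= B_k <= A_{k+1} for every index k of B (A_{-1} is empty)."""
    return all((A[k - 1] if k > 0 else set()) <= B[k] <= A[k + 1]
               for k in range(len(B)))
```

When $\delta(P,Q) \le \Delta$, `sandwiched(decoder_chain(Q, r, delta, f), A)` should hold for the pair `(r, A)` returned by `encoder_chain(P, m, delta, f)`.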

We first analyze the correctness of the decoder. 

Lemma 3.5. For every pair of distributions $P, Q$ such that $\delta(P,Q) \le \Delta$ and for every message
$m \in [N]$ such that $E_{\mathrm{low}}(P,m) \ne \bot$, it holds that $D_{\mathrm{low}}(Q, E_{\mathrm{low}}(P,m)) = m$.

Proof. Fix $P \in \mathcal{P}([N])$ and a message $m \in [N]$ such that $E_{\mathrm{low}}(P,m) \ne \bot$. The following claims
will show that the decoding process is well defined (and then correctness will be essentially
immediate).

Claim 3.6. There exists $w \in [N]$ such that $|\log 1/Q(w) - r| \le \Delta + 1$.

Proof. By our choice of $r$, we have $|\log 1/P(m) - r| \le 1$. Now using $\delta(P,Q) \le \Delta$, we have
$|\log 1/P(m) - \log 1/Q(m)| \le \Delta$, and so $|\log 1/Q(m) - r| \le \Delta + 1$. So $w = m$ gives an element in
$[N]$ with the desired property. □

Thus the chain $\mathcal{B}$ is now well-defined. It remains to show that there exists a chain $\mathcal{A}'$ satisfying
the required properties. The next claim shows that $\mathcal{B} \in S^1(\mathcal{A})$, therefore $\mathcal{A}$ is a candidate for the
role of $\mathcal{A}'$.

Claim 3.7. $\mathcal{B} \in S^1(\mathcal{A})$.

Proof. The proof follows easily from our choice of $\mathcal{A}$, $\mathcal{B}$ and the fact that $P$ and $Q$ are $\Delta$-close. Let
$k \in \{0, \ldots, f-1\}$. We need to show that $B_k$ is sandwiched between $A_{k-1}$ and $A_{k+1}$.

First, we will show that $B_k \subseteq A_{k+1}$. When $k = 0$, we need to show that $w \in A_1$. Indeed,

$$|\log 1/Q(w) - r| \le \Delta + 1 \;\Rightarrow\; |\log 1/P(w) - r| \le 2\Delta + 1 \;\Rightarrow\; w \in A_1.$$

Now consider $1 \le k \le f-1$. We have

$$\begin{aligned}
B_k &= \{m' \in [N] \mid |\log 1/Q(m') - r| \le (k+1)\Delta + 1\} \\
&\subseteq \{m' \in [N] \mid |\log 1/P(m') - r| \le (k+2)\Delta + 1\} \\
&= A_{k+1}.
\end{aligned}$$

This shows that $B_k \subseteq A_{k+1}$. Next we show that $A_{k-1} \subseteq B_k$ for $2 \le k \le f-1$. We have

$$\begin{aligned}
A_{k-1} &= \{m' \in [N] \mid |\log 1/P(m') - r| \le k\Delta + 1\} \\
&\subseteq \{m' \in [N] \mid |\log 1/Q(m') - r| \le (k+1)\Delta + 1\} \\
&= B_k.
\end{aligned}$$

The case $k = 1$, where we need $A_0 = \{m\} \subseteq B_1$, follows from the calculation in the proof of Claim 3.6. So we are done. □



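To conclude, the decoder can find a chain $\mathcal{A}'$ such that $\mathrm{sz}(\mathcal{A}') \le s$, $\mathrm{lgt}(\mathcal{A}') = \mathrm{lgt}(\mathcal{A})$, $\mathrm{Col}(s, \mathcal{A}') =
\mathrm{Col}(s, \mathcal{A})$ and there exists a chain $\mathcal{B} \in S^1(\mathcal{A}') \cap S^1(\mathcal{A})$. From Lemma 3.4 the leader of $\mathcal{A}'$ is $m$, as
required. □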

We are now ready to prove Theorem 1.4.

Proof. We now estimate the probability that the encoder will fail. Fix some distribution $P$ and a
message $m$ such that $E(P,m) = \bot$. We will first show that $P(m) < 2^{-H(P)/\epsilon}$. Later, we will bound
the probability that "$m$ has such small probability" by $\epsilon$.

Consider the chain $\mathcal{A} = (A_0, \ldots, A_f)$ as defined by the encoder. In this case, the size of
the largest set, $|A_f|$, is more than the threshold $T = 2^{\frac{H(P)}{\epsilon} + 2\Delta\log^* N + 1}$. So there is some
element $m' \in A_f$ such that $P(m') < \frac{1}{T}$. By our choice of $A_f$, $P(m') \ge 2^{-\lfloor -\log P(m) \rfloor - (f+1)\Delta - 1} \ge
P(m) 2^{-2\Delta\log^* N - 1}$. Calculating,

$$\frac{1}{T} > P(m') \ge P(m) 2^{-2\Delta\log^* N - 1},$$

and so

$$P(m) < \frac{2^{2\Delta\log^* N + 1}}{T} = 2^{-\frac{H(P)}{\epsilon}}.$$

Therefore, we can bound the failure probability by the probability that $P(m) < 2^{-H(P)/\epsilon}$. Using
the fact that $E_{m \leftarrow_P [N]}\left[\log \frac{1}{P(m)}\right] = H(P)$, we deduce the following by Markov's inequality,

$$\Pr_{m \leftarrow_P [N]}\left[P(m) < 2^{-\frac{H(P)}{\epsilon}}\right] = \Pr_{m \leftarrow_P [N]}\left[\log \frac{1}{P(m)} > \frac{H(P)}{\epsilon}\right] \le \epsilon.$$
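The Markov step can be checked numerically on any concrete distribution. The short Python sketch below (function names are illustrative, logarithms are base 2) computes the total probability of messages with $P(m) < 2^{-H(P)/\epsilon}$, which by the argument above never exceeds $\epsilon$.

```python
import math

def entropy(P):
    """Shannon entropy H(P) in bits."""
    return sum(p * math.log2(1 / p) for p in P if p > 0)

def failure_mass(P, eps):
    """Total probability of messages m with P(m) < 2^{-H(P)/eps}."""
    threshold = 2 ** (-entropy(P) / eps)
    return sum(p for p in P if 0 < p < threshold)
```

For a heavy-tailed example such as $P(i) \propto 1/i^2$ on a thousand messages, `failure_mass(P, eps)` stays below `eps` for every `eps` in $(0,1)$, in line with the bound on the failure probability.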



We will finish the proof by bounding the performance of the scheme. To this end consider a
distribution $P$ and a message $m \in [N]$ such that $E(P,m) \ne \bot$ (i.e., $\mathrm{sz}(\mathcal{A}) \le T$). The encoder sends
$r = \lfloor -\log P(m) \rfloor$, $s = \mathrm{sz}(\mathcal{A})$ and $\mathrm{Col}(s, \mathcal{A})$. We first analyze the contribution of sending $r$ to the
performance. Because $\log |r| = O\left(\log \frac{1}{P(m)}\right)$, the expected length of sending $r$ in a prefix-free
encoding is at most $O\left(E_{m \leftarrow_P [N]} \log \frac{1}{P(m)}\right) = O(H(P))$.

Now we analyze the length of $(s, \mathrm{Col}(s, \mathcal{A}))$. By Lemma 3.4,

$$\mathrm{Col}(s, \mathcal{A}) \le 2^{6(s+1)} \log^{(k)} N = 2^{O(s)}.$$

Hence, the length of $(s, \mathrm{Col}(s, \mathcal{A}))$ is at most

$$O(\log s) + \log \mathrm{Col}(s, \mathcal{A}) = O(s) = 2^{\frac{H(P)}{\epsilon} + 2\Delta\log^* N + O(1)}.$$

Thus, from the linearity of expectation, it follows that the total performance is at most

$$2^{\frac{H(P)}{\epsilon} + 2\Delta\log^* N + O(1)}.$$ □

3.3 Error-free Compression for Natural Distributions

In this section we will show that for a large class of natural distributions, the above scheme is error 
free. We start by describing the natural distributions we can capture. 

We say that a distribution $P \in \mathcal{P}([N])$ is flat if there exists a set $S \subseteq [N]$ such that $P$ is
uniform on $S$. The distribution is called geometric if there exists a parameter $\alpha \in (0,1)$ and a
permutation $\pi$ on $[N]$ such that for all $k \in [N-1]$ it holds that $P(\pi(k+1)) = \alpha P(\pi(k))$. We call
$P$ binomial if there exists a parameter $p \in (0,1)$ and a permutation $\pi$ on $[N]$ such that $\forall k \in [N]$,
$P(\pi(k)) = \binom{N}{k} p^k (1-p)^{N-k}$. The sets of all flat, geometric and binomial distributions over $[N]$ are
denoted by $\mathrm{Flat}_N$, $\mathrm{Geo}_N$ and $\mathrm{Bin}_N$ respectively.

The following theorem shows that the scheme $(E_{\mathrm{low}}, D_{\mathrm{low}})$ performs well without error on all of
the above natural distributions. Moreover, this theorem is stable in the sense that the guarantee
on the performance holds even if a distribution is only close to one of the above-mentioned natural
distributions.

Theorem 3.8. Let $\mathcal{F} = \mathrm{Flat}_N \cup \mathrm{Geo}_N \cup \mathrm{Bin}_N$ and $L(P) = 2^{H(P)} \lceil \Delta\log^* N \rceil$. Then the scheme
$(E_{\mathrm{low}}, D_{\mathrm{low}})$ (with $\epsilon$ set to 0) is a $(\Delta, 0, \mathcal{F}, O(L(P)))$-UCS. Moreover, if $P \in \mathcal{P}([N])$ is $\Delta\log^* N$-close
to a distribution $\tilde{P} \in \mathcal{F}$ then the performance of the scheme on $P$ is $E_{m \leftarrow_P [N]}[|E(P,m)|] =
O(L(\tilde{P}))$.



We prove the theorem above by identifying a broad condition on distributions, which we call
the capacity, and showing that the performance of our scheme is good if the capacity is small. We
define this notion next, show that it is small for the distributions under consideration in Lemma 3.9,
and finally bound the performance as a function of the capacity in Lemma 3.10,
thus leading to a proof of Theorem 3.8.

Let $P \in \mathcal{P}([N])$ be a distribution and let $S \subseteq [N]$ be its support. We say that $U \subseteq S$ is a unit
set of $P$ if for any two elements $m_1, m_2 \in U$ the distance $|\log P(m_1) - \log P(m_2)| \le 1$. We define
the capacity of $P$, denoted by $\mathcal{C}(P)$, to be the minimal $c \in \mathbb{R}$ such that the size of every unit set
of $P$ is bounded by $2^c$.
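Since a unit set is exactly a set of messages whose log-probabilities span an interval of length at most 1, the capacity can be computed by sorting and sliding a window. The Python sketch below is an illustration (the function names are not from the paper, and logarithms are taken base 2); it also computes the entropy so the two quantities can be compared on small examples.

```python
import math

def entropy(P):
    """Shannon entropy H(P) in bits."""
    return sum(p * math.log2(1 / p) for p in P if p > 0)

def capacity(P):
    """log2 of the size of the largest unit set of P.
    A largest unit set is a maximal window of sorted log-probabilities
    whose largest and smallest entries differ by at most 1."""
    logs = sorted(math.log2(p) for p in P if p > 0)
    best, lo = 0, 0
    for hi in range(len(logs)):
        while logs[hi] - logs[lo] > 1:
            lo += 1
        best = max(best, hi - lo + 1)
    return math.log2(best)
```

On a flat distribution over 8 messages both `capacity` and `entropy` evaluate to 3, and on a geometric distribution the capacity stays within an additive constant of the entropy, matching the behaviour the section goes on to prove.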

Later, we will prove the next lemma, showing that for the previously discussed distributions, 
the capacity is roughly the entropy. 

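Lemma 3.9. Let $P \in \mathrm{Flat}_N \cup \mathrm{Geo}_N \cup \mathrm{Bin}_N$. Then $\mathcal{C}(P) \le H(P) + O(1)$.

Theorem 3.8 follows immediately from Lemma 3.9 combined with the following lemma.

Lemma 3.10. For every $P$, $(E_{\mathrm{low}}, D_{\mathrm{low}})$ (with respect to $\epsilon = 0$) is a
$\left(\Delta, O\left(\log(H(P)) + 2^{\mathcal{C}(P)} \lceil \Delta\log^* N \rceil\right)\right)$ scheme. Moreover, if $P$ is $\Delta\log^* N$ close to a
distribution $\tilde{P}$, then the performance of the scheme on $P$ is $O\left(\log(H(P)) + 2^{\mathcal{C}(\tilde{P})} \lceil \Delta\log^* N \rceil\right)$.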



Proof. When setting $\epsilon = 0$, the encoder never outputs $\bot$. Lemma 3.5 already implies the correctness
of the scheme. The only remaining task is to analyze the performance of the scheme.

Recall, the output of the encoder has three components: $r$, $s$ and $\mathrm{Col}(s, \mathcal{A})$. From linearity of
expectation it suffices to analyze the expected length of each component separately.

For a given word $m \in [N]$, the first component is $r = \left\lfloor \log \frac{1}{P(m)} \right\rfloor$. Its length is
$O\left(\log\log \frac{1}{P(m)}\right)$. Using the concavity of the function $\log$ we can bound the expectation of $|r|$
as follows:

$$E\left[\log\log \frac{1}{P(m)}\right] \le \log E\left[\log \frac{1}{P(m)}\right] = \log(H(P)).$$

Now consider the chain $\mathcal{A}$ with size $s$ and length $f = \log^*(N) - O(1)$ as defined by the encoder.
The second component is the size $s$. Clearly, $|s| = O(\log s)$.

The third component is $\mathrm{Col}(s, \mathcal{A})$. By Lemma 3.4, $\mathrm{Col}(s, \mathcal{A}) = \exp(s)$ so $|\mathrm{Col}(s, \mathcal{A})| = O(s)$.

Hence the expected length of the last two components is bounded by $O(s)$. Let $\tilde{P} \in \mathcal{P}([N])$ be
a distribution that is $\Delta\log^* N$-close to $P$. To achieve the results it is enough to show that the size
$s$ of the chain $\mathcal{A}$ associated with $P$ and $m$ is bounded by $O\left(2^{\mathcal{C}(\tilde{P})} \lceil \Delta\log^* N \rceil\right)$.

The size $s = \mathrm{sz}(\mathcal{A})$ is the size of the following set,

$$A = \{m' \in [N] \mid |\log 1/P(m') - r| \le 2\lceil \Delta\log^* N \rceil + 1\}.$$

We will show that this set can be covered by $O(\lceil \Delta\log^* N \rceil)$ unit sets of $\tilde{P}$. This will yield an upper
bound on $s$ of $O\left(2^{\mathcal{C}(\tilde{P})} \lceil \Delta\log^* N \rceil\right)$ as required.

Let $k = 3\lceil \Delta\log^* N \rceil + 1$. Define $U_{-k}, \ldots, U_{k-1}$ as

$$U_i = \{m' \mid i \le r + \log \tilde{P}(m') < i + 1\}.$$

Clearly the $U_i$ are unit sets of $\tilde{P}$. Moreover, their union is the set

$$\bigcup_{i=-k}^{k-1} U_i = \left\{m' \;\middle|\; |\log 1/\tilde{P}(m') - r| \le 3\lceil \Delta\log^* N \rceil + 1\right\}.$$

Let $m' \in A$. It remains to verify that $m' \in \bigcup_{i=-k}^{k-1} U_i$. Indeed,

$$|\log 1/\tilde{P}(m') - r| \le |\log 1/P(m') - r| + \Delta\log^* N \le 2\lceil \Delta\log^* N \rceil + 1 + \Delta\log^* N \le 3\lceil \Delta\log^* N \rceil + 1.$$

Therefore, $|A| \le \sum_i |U_i| = O\left(2^{\mathcal{C}(\tilde{P})} \lceil \Delta\log^* N \rceil\right)$ as required. □



To complete the proof of Theorem 3.8, we will prove Lemma 3.9. The proof follows immediately
from the next three claims.

Claim 3.11. Let $P \in \mathrm{Flat}_N$. Then $\mathcal{C}(P) \le H(P)$.

Proof. Let $S \subseteq [N]$ be the support of $P$. Clearly, $H(P) = \log |S|$. For every $U \subseteq S$ that is a unit
set of $P$,

$$|U| \le |S| = 2^{H(P)}.$$

Thus, $\mathcal{C}(P) \le H(P)$. □

Claim 3.12. Let $P \in \mathrm{Geo}_N$. Then $\mathcal{C}(P) \le H(P) + O(1)$.

Proof. Let $\alpha \in (0,1)$ be such that for all $k \in [N-1]$, $P(k+1) = \alpha P(k)$. We will assume that
$\alpha^N \le \frac{1}{2}$. Otherwise,

$$H(P) \ge \log(N) - 1 \ge \mathcal{C}(P) - 1,$$

and we are done.

Let $U$ be the maximal unit set of $P$, i.e., $|U| = u = 2^{\mathcal{C}(P)}$. Let $k \in U$ be the element with the
highest probability in $U$. From maximality of $U$ we can assume that $U = \{k, k+1, \ldots, k+u-1\}$.
Calculating,

$$1 \ge |\log P(k) - \log P(k+u-1)| = (u-1) \log \frac{1}{\alpha}.$$

Therefore, $u \le \frac{1}{\log(1/\alpha)} + 1 = O\left(\frac{1}{1-\alpha}\right)$. To achieve the result it is enough to show that $\frac{1}{1-\alpha} \le 2^{H(P) + O(1)}$,
i.e., $H(P) \ge \log \frac{1}{1-\alpha} - O(1)$. Calculating the entropy, and using $P(k) = \frac{1-\alpha}{1-\alpha^N} \alpha^{k-1}$, indeed,

$$H(P) = \log \frac{1-\alpha^N}{1-\alpha} + \log \frac{1}{\alpha} \cdot \sum_{k=1}^{N} (k-1) P(k) \ge \log \frac{1-\alpha^N}{1-\alpha} \ge \log \frac{1}{1-\alpha} - O(1),$$

where the last inequality uses $\alpha^N \le \frac{1}{2}$, as required. □

Claim 3.13. Let $P \in \mathrm{Bin}_N$. Then $\mathcal{C}(P) \le H(P) + O(1)$.

Proof. Let $p \in (0,1)$ be such that $P(k) = \binom{N}{k} p^k (1-p)^{N-k}$. Let $U$ be a unit set of $P$ with size $2^{\mathcal{C}(P)}$.
We will partition the codewords in $[N]$ into three regions and bound the number of codewords from
each region in $U$. The regions are:

1. $\{k \in [N] \mid k \ge pN + \sqrt{pN}\}$,

2. $\{k \in [N] \mid k \le pN - \sqrt{pN} - 1\}$,

3. and $\{k \in [N] \mid pN - \sqrt{pN} - 1 < k < pN + \sqrt{pN}\}$.

We will show that in any region, the number of elements from the region in $U$ is bounded by
$O(\sqrt{pN})$. This will yield a total bound of $|U| = O(\sqrt{pN})$. The entropy of $P$ is $H(P) =
\frac{1}{2} \log(2\pi e N p (1-p))$. Therefore $\sqrt{pN} = 2^{H(P) + O(1)}$ and the result follows.

First we consider elements $k$ from the first region. Let $u_1$ be the number of words in $U$ from
this region. In this case

$$\begin{aligned}
\frac{P(k+1)}{P(k)} &= \frac{\binom{N}{k+1} p^{k+1} (1-p)^{N-k-1}}{\binom{N}{k} p^k (1-p)^{N-k}} = \frac{N-k}{k+1} \cdot \frac{p}{1-p} \le \frac{N-k}{k} \cdot \frac{p}{1-p} \\
&= \left(\frac{N}{k} - 1\right) \frac{p}{1-p} \le \left(\frac{N}{pN + \sqrt{pN}} - 1\right) \frac{p}{1-p} \le 1 - \frac{1}{\sqrt{pN} + 1}.
\end{aligned}$$

In a similar way to the proof of Claim 3.12 we can conclude that $u_1$ is bounded by $O(\sqrt{pN} + 1) =
O(\sqrt{pN})$.

Now consider an element $k$ in the second region. Similarly:

$$\begin{aligned}
\frac{P(k+1)}{P(k)} &= \frac{N-k}{k+1} \cdot \frac{p}{1-p} \ge \frac{N-(k+1)}{k+1} \cdot \frac{p}{1-p} = \left(\frac{N}{k+1} - 1\right) \frac{p}{1-p} \\
&\ge \left(\frac{N}{pN - \sqrt{pN}} - 1\right) \frac{p}{1-p} \ge 1 + \frac{1}{\sqrt{pN}}.
\end{aligned}$$

Therefore $u_2$, the number of elements from the second region in $U$, is bounded by $O(\sqrt{pN})$.

Clearly $u_3$, the number of elements from $U$ in the last region, is bounded by the size of the
region. So $u_3 = O(\sqrt{pN})$.

Combining the above, we get

$$2^{\mathcal{C}(P)} = |U| = \sum_{i=1}^{3} u_i = O(\sqrt{pN}),$$

as required. □




3.4 Dependence of communication on entropy 

In the previous sections we gave a scheme whose performance is exponential in the entropy. This
scheme is error-free for some natural distributions and has positive error for general distributions.
The next lemma shows that if no scheme achieves performance that is linear in the
entropy, then every scheme must have positive error for some distributions.

Lemma 3.14. For every non-decreasing function $L : \mathbb{R}^+ \to \mathbb{R}^+$ there exists a constant $c = c_L$
such that the following holds: If there exists a $(\Delta, L(H(P)))$-UCS for some $\Delta > 0$, then there exists
a $(\Delta, c \cdot (1 + H(P)))$-UCS.

Proof. We will prove the lemma for $c = L(3) + 2$. Let $(E, D)$ be the $(\Delta, L(H(P)))$-UCS. We will
construct a UCS $(E', D')$ that has the required performance.

For every distribution $P \in \mathcal{P}([N])$ and real number $M \ge 1$, we introduce a notion of an
$M$-concentrated version of $P$, denoted $P_M$, to be: $P_M(1) = 1 - 1/M + (1/M) \cdot P(1)$ and $P_M(i) =
(1/M) \cdot P(i)$ for $i > 1$. So $P_M$ is mostly focused on a single point and so has small entropy, but it
provides enough variability to capture the variation of $P$. In what follows, we will apply $(E, D)$ to
the distributions $P_M$ and $Q_M$ for an appropriate choice of $M$, chosen to reduce the entropy of $P_M$
to be a constant, and this will give the schemes $E'$ and $D'$.

The new scheme $(E', D')$: On input $P \in \mathcal{P}([N])$ and $m \in [N]$, $E'(P,m)$ is computed as follows:

Let $M = \lceil H(P) \rceil$. Then $E'(P,m) = (M, E(P_M, m))$.

On input $Q$ and received string $y' = (M, y)$ the decoding is $D'(Q, y') = D(Q_M, y)$.
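The effect of concentrating a distribution can be observed numerically. The Python sketch below is an illustration only: the function names are hypothetical, message 1 is stored at index 0, logarithms are base 2, and `distance` implements $\delta(P,Q)$ as the largest absolute log-ratio, which is how the distance is used in this section (the formal definition appears earlier in the paper).

```python
import math

def entropy(P):
    """Shannon entropy H(P) in bits."""
    return sum(p * math.log2(1 / p) for p in P if p > 0)

def concentrate(P, M):
    """M-concentrated version P_M: scale P by 1/M and
    add the remaining 1 - 1/M mass to message 1 (index 0)."""
    PM = [p / M for p in P]
    PM[0] += 1 - 1 / M
    return PM

def distance(P, Q):
    """max over m of |log2 P(m) - log2 Q(m)|; assumes full support."""
    return max(abs(math.log2(p / q)) for p, q in zip(P, Q))
```

Taking `M = max(1, math.ceil(entropy(P)))`, the entropy of `concentrate(P, M)` should be at most 3, and `distance(concentrate(P, M), concentrate(Q, M))` should not exceed `distance(P, Q)`; these are the two facts the remainder of the proof establishes.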

In what follows we argue that this is a valid zero-error UCS for uncertainty parameter $\Delta$, with
performance $c \cdot (1 + H(P))$. We start by proving its validity.

Claim 3.15. For every pair $P, Q \in \mathcal{P}([N])$ such that $\delta(P,Q) \le \Delta$, and for every $m \in [N]$ we have
$D'(Q, E'(P,m)) = m$.

Proof. Fix $M = \lceil H(P) \rceil$. Since $E'(P,m) = (M, E(P_M, m))$ and $D'(Q, (M,y)) = D(Q_M, y)$, it suffices
to prove that $P_M$ and $Q_M$ are $\Delta$-close, since then we can use the correctness of $(E, D)$ on $P_M$ and
$Q_M$ to conclude $D(Q_M, E(P_M, m)) = m$. Below we verify that $P_M$ and $Q_M$ are $\Delta$-close.

First we consider $m \in [N] \setminus \{1\}$. For such $m$ we have $P_M(m) = \frac{1}{M} P(m)$ and $Q_M(m) = \frac{1}{M} Q(m)$
and so $P_M(m)/Q_M(m) = P(m)/Q(m)$. So $|\log P_M(m)/Q_M(m)| = |\log P(m)/Q(m)| \le \Delta$.

Now, consider $m = 1$. In this case $P_M(m) = \left(\frac{M-1}{M}\right) + \left(\frac{1}{M}\right) \cdot P(1)$ and $Q_M(m) = \left(\frac{M-1}{M}\right) + \left(\frac{1}{M}\right) \cdot
Q(1)$. Assume $P(1) \ge Q(1)$ (the other case is similar) and so $0 \le \log P(1)/Q(1) \le \Delta$. On the one
hand we have $P_M(1) \ge Q_M(1)$ and on the other hand we have $P_M(1)/Q_M(1) \le P(1)/Q(1)$ (which
holds for every $M \ge 1$). It follows that $0 \le \log P_M(1)/Q_M(1) \le \log P(1)/Q(1) \le \Delta$.

It follows that $\delta(P_M, Q_M) \le \Delta$ and the claim follows. □

It remains to analyze the performance of the scheme.

Claim 3.16. For every distribution $P \in \mathcal{P}([N])$, we have $E_{m \sim_P [N]}[|E'(P,m)|] \le c \cdot (1 + H(P))$.

Proof. Recall that the encoding of $m \in [N]$ is the pair $(M, E(P_M, m))$ where $M = \lceil H(P) \rceil$. It
follows that the first part of the encoding is always of length at most $2 \cdot (1 + \log H(P))$ (allowing for
prefix-free encodings and rounding up of $H(P)$ to its ceiling). We crudely bound the above by
$2(1 + H(P))$.

We turn to the length of the second part, i.e., $E(P_M, m)$. We first show that

$$E_{m \sim_P [N]}[|E(P_M, m)|] \le M \cdot E_{m \sim_{P_M} [N]}[|E(P_M, m)|].$$

We then bound $E_{m \sim_{P_M} [N]}[|E(P_M, m)|]$ by $L(3)$, thus giving us that the total expected length of the
encoding is $E_{m \sim_P [N]}[|E'(P,m)|] \le (L(3) + 2)(1 + H(P)) = c(1 + H(P))$.

We start by showing the first step. We have

$$\begin{aligned}
E_{m \sim_{P_M} [N]}[|E(P_M, m)|] &= \frac{1}{M} E_{m \sim_P [N]}[|E(P_M, m)|] + \left(1 - \frac{1}{M}\right) |E(P_M, 1)| \\
&\ge \frac{1}{M} E_{m \sim_P [N]}[|E(P_M, m)|].
\end{aligned}$$

It follows that $E_{m \sim_P [N]}[|E(P_M, m)|] \le M \cdot E_{m \sim_{P_M} [N]}[|E(P_M, m)|]$ as asserted.

By the performance of $E$ on $P_M$ we have $E_{m \sim_{P_M} [N]}[|E(P_M, m)|] \le L(H(P_M))$. So it suffices
to show $H(P_M) \le 3$. This is straightforward from the definition of $P_M$ and the choice of $M$. We
have

$$\begin{aligned}
H(P_M) &= \sum_{m \in [N]} P_M(m) \log 1/P_M(m) \\
&\le (1 - 1/M) \log 1/P_M(1) + (1/M) \cdot \sum_{m \in [N]} P(m) \log M/P(m) \\
&\le 1 + 1/M \cdot (H(P) + \log M) \\
&\le 1 + 1 + \log M / M \\
&\le 3
\end{aligned}$$

(using $P_M(1) \ge 1/2$ if $M \ge 2$ and $1 - 1/M = 0$ otherwise in the third line), as required.

The claim follows and so does the lemma. □